"Skynet - Machine Learning with Satellites and OpenStreetMap data." By: Anand Thakker. >> Okay. Next up is Anand Thakker. Go ahead. >> Hello. Hi, can you hear me in the back? Good. I'm Anand Thakker, I work at Development Seed in DC, and going to pick up where Kevin left off talking about satellite imagery as a dataset, not just a picture. And before I go any further, I have a confession to make is that I've never seen a single terminator movie so all of this Skynet stuff relate to I've been hashtagging is me being a total poser, but I think that's appropriate because along the Skynet, I know almost nothing everything else on that screen other than maybe my name. I'm a amateur at remote sensing and my number of OSM edits is less than five I think, so shame. But I'm hoping to up that number quite a bit pretty soon. So that's all just to say maybe that if I have one goal for this talk, it is to convey that all this stuff, this machine learning stuff is something that amateurs like me and maybe you can do and so I'm excited about the prospect of having this community kind of bring that in into what we do as, you know, as do it yourself making the world better in terms of. Oh, one more preliminary. Image credits any satellite or aerial looking imagery you see here is all from Mapbox imagery tiles either N AIP or DigitalGlobe. So let me say a little bit about why. I think some of us may be not particularly new ideas here but not why I thought doing machine learning with satellite imagery and OpenStreetMap data is a good idea. So sort of three answers. First answer is -- excuse me, a minute. I have notes on this computer. First answer is that the kinds of stuff that we want to do with machine learning, so using it with satellite imagery -- >> Hold the mic up. >> Oh, using it as a dataset. Is a lot of it is hard. So classifying images, extracting features from images, it's hard. And certain class of machine learning algorithms particularly deep learning seems to love these problems, especially in recent years, it's just leading these problems. So this is from a paper a few years ago by some researchers what they found is that road detection while it sort of works in rural areas and things like that, it's not like there. Like, we don't have road detection just happening in production. We didn't then. And from what I can tell, that hasn't changed all that much. I would love to be wrong. So that's the problem being hard. Here's the problem being easy for machine learning. So this is for Google a couple of years ago, the image net machine learning challenge collection challenge. Generally speaking computer vision problems seem to be quite tractable for deep learning, visual learning models. So this is the reason why. Because it can. Because machine learning can handle this stuff, that's one reason we should do it. Another reason we should do it with OSM data is that putting OSM together with satellite imagery makes a really great fit for machine learning. And to say why, I'm going to say a little bit about why what machine learning actually is. So if you kind of already know this, bear with me a little bit. And if you don't, then also bear with me, because this will be a totally inadequate explanation. So what is it? Okay. And I'm going to talk about a particular type of machine learning. If you want to have a broader -- sorry. If you want a sort of broader look at how -- what it looks like to do machine learning, check out Stuart Lynn's talk. It's a really great examples and resources. 
But supervised learning. So basically what you have is a model, and a model is a black box: it takes some kind of input and gives you an output, but you don't really know too much about what's going on inside. It has some knobs and dials that change what's going on inside, but you don't really know what those do. That's your model. And then you have training data, which is a bunch of example inputs and, for each of those examples, the true or expected output that you want from the model. Generally it's reasonably large, but it's not as large as all the data you want to deal with. So, for example, you might have images: the picture would be the input, and these boxes with the correct descriptions would be the expected output. Or you might have recordings of people talking as the input and the correct textual transcription as the expected output.

And then what you do is you train it, and it's a weirdly simple idea. Some of the details are complicated, but those are for the machine learning researchers to deal with; for us, it's a pretty simple idea. You take your model, this black box. It starts out totally random, and you just apply it to the inputs, and obviously it's just wrong, right? Because it doesn't know anything, it gets the wrong answer for everything. Then you compare those wrong answers to what you expected, and you get a calculated error. And based on that error, you just tweak the model a little bit. For instance, if the outputs were numbers, and the first time through the numbers came out way too high, you would slightly adjust the knobs to try to steer the model in the direction of giving you lower numbers, and then you try again. That's a simplification, but that's basically what's happening. And when I say you're doing this, really your script is doing this; the library you're using is doing it for you. So this is what's happening when you train the model.
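To make that concrete, here's a minimal sketch of that apply-compare-tweak loop in plain Python with numpy. Everything here -- the tiny two-knob model, the made-up data, the learning rate -- is purely illustrative, not the actual Skynet code; real deep learning models work the same way in spirit, just with millions of knobs and a library doing the tweaking for you.

```python
import numpy as np

# Toy training data: example inputs x and, for each one, the expected output y.
# The "hidden" relationship the model should discover is y = 3x + 1.
x = np.random.rand(100, 1)
y = 3.0 * x + 1.0

# The black box: a model with two knobs (w and b), initialized randomly.
w = np.random.randn(1)
b = np.random.randn(1)
learning_rate = 0.1

for step in range(1000):
    pred = w * x + b    # apply the model to the inputs
    error = pred - y    # compare against the expected outputs
    # Tweak the knobs slightly in the direction that reduces the error,
    # then go around again.
    w -= learning_rate * (error * x).mean()
    b -= learning_rate * error.mean()

print(w, b)  # ends up near 3.0 and 1.0
```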
So with that description of machine learning, the second reason I think this is a good idea is that with satellite imagery as the input and OpenStreetMap as a potential source of annotations -- of knowledge about that imagery -- we have a really great potential source of training data, or multiple sources of training data. So that's answer two.

Answers one and two are really about why we could do this. But maybe a bigger question is why we would want to, or why we should. I sort of think that's probably not one that needs much explanation here, right? Why should we automatically pull road features out of satellite images? Because we can improve the map, and we can use that additional insight to improve how we use OSM to improve the world. We can improve the map because, if we can automatically know where the roads are, we can prioritize tracing candidates. Maybe we can even initialize new tracing with some geometries that a person can go fix; maybe that's easier than tracing from scratch. Maybe we can even get suggested additional tags or metadata for existing features, right? So these are pretty clear ways we could improve the map if we could do some automated extraction.

And then improving the world: if we're using OpenStreetMap to help find people in a crisis, or find people that are not connected, or something like that, that's great. Getting people out there to improve the map when we need it is obviously something we're always willing to do. But in the meantime, while we're waiting to get that map filled in, why not have the computer give us slightly imperfect but more comprehensive data to work from? So that's why.

Before I dive into some of the experiments that I've been doing, and some of the failures and results that I've been getting, let me say a couple of things about some similar machine learning work I've seen other people doing recently. I'm sure this is not all of it; it's just the stuff that came across my radar.

One example, you may have seen this; it's actually by Andrew Johnson, who I think is in the audience here. This is really cool, and sorry if I get it slightly wrong: I think it's using a neural network to determine the likelihood that a particular spot on the map has a road misregistered. So the model thinks there is a road, but maybe there isn't a road in OSM, which suggests that something's wrong. And what's really cool is that he built this into a working live site that actually ranks scenes and spots by the likelihood of error and then gives you a link to fix them in the iD editor. That's where I'll get my second OSM edit.

Another example you may have seen is Terrapattern. Terrapattern is a different and quite creative idea. This is a team that trained a deep neural network to classify a spot of imagery into the metadata that's at that spot. But the way they use that is very interesting. They built this app where you click on a spot, and it looks at how the neural network classifies that spot and then searches for any other imagery in that city that is classified in basically the same way -- where the model thinks it has the same categories. The result is this really cool visual search: you click on a bridge, and you get all of these other bridges. And it's not just doing a visual comparison; it's actually using the model's, quote, unquote, insights. So that's really cool.
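Mechanically, that kind of search is simpler than it sounds. Here's a rough sketch of the idea -- my own illustration, not Terrapattern's actual code, with made-up tile names and vector sizes: keep each tile's network activations as a feature vector, and the click becomes a nearest-neighbor lookup.

```python
import numpy as np

# Pretend each map tile has already been run through a trained network and we
# kept its final-layer activations as a feature vector. These are stand-ins;
# Terrapattern's real pipeline differs in its details.
tile_ids = ["bridge_tile", "park_tile", "marina_tile"]
features = np.random.rand(3, 512)  # one 512-d embedding per tile

def most_similar(query_vec, features, tile_ids, k=3):
    """Rank tiles by cosine similarity to the clicked tile's feature vector."""
    norms = np.linalg.norm(features, axis=1) * np.linalg.norm(query_vec)
    scores = features @ query_vec / norms
    order = np.argsort(-scores)[:k]
    return [(tile_ids[i], float(scores[i])) for i in order]

# Clicking the bridge tile returns the tiles whose activations look most alike.
print(most_similar(features[0], features, tile_ids))
```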
Okay. Skynet experiments. My big goal that I wanted to experiment towards was just tracing roads. Much like the Facebook talk some of you may have seen, I thought, hey, it would be cool to get roads automatically added with this.

So here's what I did. I found this paper and model, called SegNet, that some researchers at the University of Cambridge released late last year. Here's what it does. You can see it: the top row is the inputs, the middle is the ground truth, and the bottom is what SegNet sees in the image. I saw this and I was, like, whoa, that's cool. It's finding the toilet, right? And you can see the couch. And they released this model: they open sourced the code they used to train it, and even the final trained models, so you could grab it and use it right off the bat with no training. Which is great. So I figured, well, they've done the hard part, because they know machine learning.

All I need is analogous training data built from satellite imagery, and maybe we can find some roads. So that's the first thing I did: I built a pipeline to produce training data easily. You have to feed the imagery through in small patches, and it turns out there's a natural format for getting satellite imagery and OSM data in patches, and that is tiles. So these scripts, which you can use, grab tiles for an area from Mapbox satellite, grab the OSM QA tiles, and then use Mapnik to render them into ground truth images that look like these. Here's an example: that's a Mapbox satellite tile, and there's the rendered ground truth image, which in this case is just a rendering of the OSM roads at that location.

This is also an example of the first -- oops -- the first mistake that I made, which is that I rendered the ground truth at one pixel width. I made three or four training sets like this, and what I found when I tried to train the model is that it just could not do it. It couldn't learn to draw these superfine lines; maybe they were misregistered or something like that. It took me an embarrassingly long time to realize that that was probably the issue, but luckily the data prep scripts I'd built gave me an easy way to change it. This is how I declared how I wanted the ground truth data to be generated from OSM; you can see the filter there. All I had to do was change the one to a five and regenerate.
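To give a flavor of what a pipeline like that involves, here's a minimal sketch -- not the actual skynet-data scripts, which render the ground truth with Mapnik; the URL template, the `roads_for` helper, and the `render_width` parameter are assumptions for illustration.

```python
import io

import mercantile  # tile math: lon/lat bounds -> z/x/y tile coordinates
import requests
from PIL import Image, ImageDraw

# Hypothetical URL template and token; the real scripts pull Mapbox satellite
# tiles. PIL stands in here for the Mapnik rendering step.
SAT_URL = "https://api.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.png?access_token=TOKEN"

def training_pair(tile, road_lines, render_width=5):
    """Fetch one satellite tile and rasterize the matching road ground truth.

    road_lines: the tile's OSM road geometries (e.g. decoded from OSM QA
    tiles), already projected to pixel coordinates within the 256x256 tile.
    """
    # Input image: the satellite tile.
    resp = requests.get(SAT_URL.format(z=tile.z, x=tile.x, y=tile.y))
    sat = Image.open(io.BytesIO(resp.content))

    # Ground truth image: roads drawn as white lines on black. render_width
    # is the "change the one to a five" knob from the talk.
    truth = Image.new("L", (256, 256), 0)
    draw = ImageDraw.Draw(truth)
    for line in road_lines:
        draw.line(line, fill=255, width=render_width)
    return sat, truth

# One (input, ground truth) pair per z17 tile covering an area of interest:
for tile in mercantile.tiles(-122.46, 47.48, -122.22, 47.73, zooms=[17]):
    pass  # sat, truth = training_pair(tile, roads_for(tile))
```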
And then I finally started getting some interesting results. Here are a few examples of the new data with the larger ground truth images; they're a little bit wider. Oh, I should say that the images you're seeing here, and the rest that you'll see in this presentation, are not actually from the training data, because an important thing in machine learning is that you have to not -- oh, shit --

[Laughter]

>> Five minutes for questions, if you want to keep talking.

>> Keep going.

>> I'm much more interested in your questions than anything I say.

[Laughter]

>> It's lunch.

>> Let's go.

>> Okay, I'll skip a few things. So basically, these are not the images it was trained with, because you don't want to know whether it memorized the answers; you want to know whether it learned something. So this is the first set that did anything. Here it is after a few hours of training. Pretty cool: it's seeing some stuff, but it hasn't learned to draw clean lines yet; it's this weird cloudy stuff. But, hey, I think it might have found a road that wasn't in OSM, so that's promising, right? Here it is after a few more hours -- I think a day of training -- and it has gotten a little bit cleaner, right?

But the problem is that this was trained with a random sample across the whole U.S., which means basically all rural, because if you throw a dart at the U.S., you don't hit cities, right? So I threw it at Seattle and said, okay, let's retrain with just images from Seattle. Here are some of those. New training set, just Seattle: much better, right? This is after, again, just three, four, five hours, and these on the very right are looking pretty good; that's after about a day and a half of training. A day and a half of training on a GPU instance is, like, 20 bucks or something. I don't know. Not too bad.

Interesting: here's a place where it's doing pretty well, and here's a way that it fails. It thinks lanes in parking lots are roads.

>> They are.

[Laughter]

>> Yeah, if that's what you want, then you can keep it at that. If you want it to learn a little better, we can use the polygons around parking lots in the training data so it can learn the difference. On the other hand, here's something pretty exciting -- maybe the most exciting thing to me. Check out this road here, where you have this deep shadow across the road. The model just doesn't even care. It's just, like, oh, shadow? Whatever.

[Laughter]

I guess I'll say a final thing here, which is one of the problems I had: how do I assess how well my model did? Because the ground truth I'm using is from OpenStreetMap, the best I can do is say how correct it is compared to OpenStreetMap. But when OSM is missing a road, the model gets marked down, poor guy, for doing something good, right? Here's a road that's not in OSM, and the model is getting a worse correctness score for finding it. So one way we can really improve this, I think, is by getting some place where we know we have really complete data and using that for training.
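Concretely, the kind of correctness score I mean is just a pixelwise comparison between the model's output and the OSM-rendered mask. Here's a minimal sketch -- my illustration, with made-up names and threshold, not the actual evaluation code:

```python
import numpy as np

def pixel_scores(predicted, osm_truth, threshold=0.5):
    """Pixelwise precision/recall of a road-probability image against an
    OSM-rendered ground truth mask. Both arrays are HxW; osm_truth is 0/1."""
    pred = predicted > threshold
    truth = osm_truth.astype(bool)
    tp = np.sum(pred & truth)
    precision = tp / max(np.sum(pred), 1)  # hurt by "false" positives that
                                           # are real roads missing from OSM
    recall = tp / max(np.sum(truth), 1)
    return precision, recall
```

The comment marks exactly the trap above: a road the model correctly finds but OSM lacks gets counted as a false positive.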
This is where I'm supposed to do a demo, and who knows if it's going to work. Let me stop for questions, though; I want to take at least one or two.

>> I have a simple question. Can this training system be used to detect something that's not as thin as a road, like perhaps a pool or something?

>> Yeah, good question. I haven't tried it on much else, because I just haven't had time yet; this has been between projects. But I did start trying it on buildings, and it looked like it started working. The problem was that wherever I tried it, the OSM building data was really incomplete, and that made it really hard for the model to learn, because it kept getting negative feedback, right? So with the right data and the right place, yes, definitely.

>> Thanks. Have you tried this much with rural areas that are unpaved, like forest roads or trails or anything like that?

>> No. But that's actually the reason I want to do this. I'm not that interested in automatically tracing roads in places like the U.S., where we have a lot of resources to do it. Similar to what the Facebook folks were saying, this is most interesting when we can use it to help us get places that are not mapped, mapped better. So this model does not perform well if you take it to, like, Egypt or something like that. But I think if we train it on places that look sort of similar, then it really can work; I think this shows that we can make that happen. It's just a matter of getting the right training data.

>> Do you think this could be used to learn the direction of the road? Like, looking at the angle, where the cars are heading, and stuff like that?

>> Yeah, the direction of the cars, that's an interesting point. I've thought about maybe adding telemetry data in and somehow putting that together for training. But even the direction of the cars might be interesting.

>> Hi, I was wondering if you needed help with this?

>> Yes.

>> Let's talk later.

>> Yes. It's all on GitHub, and contributions are very, very welcome. We're also hiring. By the way, I clicked a few times here: the model is live on a server, and when I click, it's sending the image up and getting a prediction back. This is not canned; it's getting the live prediction back. And it's not as fast as I want it to be, but --

[Applause]

>> Let's go eat lunch.

>> I have a quick question.

>> One more.

>> Could you take the output and turn it into OpenStreetMap data -- data to add to the map?

>> Oh, are you saying we could clean up the output and -- yeah, actually, that's another next step I really want: to vectorize the output from this and get clean vector geometry that could actually be used as an initial trace.

>> So once you're recognizing the roads, is that being converted into a line, or the shape of the road itself?

>> As of now, it's being converted to neither. Literally, that's just an image overlay, because what this neural network is really doing is image transformation. It's a crazy one, right? It's just doing a bunch of matrix math on the images, somehow taking this very interesting, complex image -- the actual imagery -- and producing this yellow mask. Now that the lines are coming out pretty clean, vectorizing it is not that hard. Back when I first started getting those cloudy, weird early images, I was, like, yeah, we're never going to be able to vectorize this. But now that it's crisp, I think it's doable, and it sounds like the Facebook folks have it working, so I would use that. Okay. Let's go eat.

[Applause]